Processing output from bioinformatics pipelines
…BRIEF INTRO IN PROGRESS…
A tentative snakemake workflow that defines data processing rules in a DAG (directed acyclic graph) format. A detailed interactive snakemake HTML report is available here. You will be able to explore the workflow and the associated statistics. You can close the left bar (overlap) to get a more expansive display view.
Processing MOTHUR output
Import library
library(tidyverse, suppressPackageStartupMessages())
Mothur metadata
library(tidyverse, suppressPackageStartupMessages())
Mothur otutable
Mothur taxonomy
Mothur composite
Processing QIIME2 output
QIIME2 metadata
library(tidyverse, suppressPackageStartupMessages())
# QIIME2 metadata
qiime2_metadata <- read_tsv("resources/metadata/qiime2_metadata_file.tsv",
show_col_types = FALSE) %>%
rename(sample_id="sample-id")
QIIME2 otutable
qiime2_otutable <- read_tsv("qiime2_process/transformed/feature-table.tsv", skip = 1, show_col_types = FALSE) %>%
rename(feature='#OTU ID') %>%
select(-starts_with('Mock')) %>%
mutate_at(2:ncol(.), as.numeric) %>%
mutate_if(is.numeric, ~replace(., is.na(.), 0))
qiime2_otutable <- qiime2_otutable %>%
pivot_longer(-feature, names_to = "sample_id", values_to = "count") %>%
relocate(sample_id, .before = feature)
QIIME2 taxonomy
qiime2_taxonomy <- read_tsv("qiime2_process/transformed/taxonomy.tsv", show_col_types=FALSE) %>%
rename(feature="Feature ID") %>%
distinct() %>%
mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
mutate(Taxon = str_replace_all(Taxon, "; s__$", ""),
Taxon = str_replace_all(Taxon, "; g__$", ""),
Taxon = str_replace_all(Taxon, "; f__$", ""),
Taxon = str_replace_all(Taxon, "; o__$", ""),
Taxon = str_replace_all(Taxon, "; c__$", ""),
Taxon = str_replace_all(Taxon, "; p__$", ""),
Taxon = str_replace_all(Taxon, "; k__$", ""),
Taxon = str_replace_all(Taxon, "\\[|\\]", ""),
Taxon = str_replace_all(Taxon, "\\s", "")) %>%
dplyr::filter(!grepl("s__*", Taxon)) %>%
dplyr::filter(grepl("g__*", Taxon)) %>%
select(-Confidence) %>%
mutate(Taxon = str_replace_all(Taxon, "\\w__", "")) %>%
separate(Taxon, into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";")
QIIME2 composite
qiime2_composite <- inner_join(qiime2_metadata, qiime2_otutable, by = "sample_id") %>%
inner_join( ., qiime2_taxonomy, by = "feature") %>%
group_by(sample_id) %>%
mutate(rel_abund = count/sum(count)) %>%
ungroup() %>%
mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
relocate(count, .before = rel_abund)
write_tsv(qiime2_composite, "qiime2_process/transformed/qiime2_composite.tsv")
Citation
Please consider citing the iMAP article[1] if you find any part of the IMAP practical user guides helpful in your microbiome data analysis.
References
[1]
Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C.,
Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated
bioinformatics and visualization pipeline for microbiome data analysis.
BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4
Appendix
Project main tree
.
├── LICENSE.md
├── README.md
├── UntitledRMD.Rmd
├── config
│  ├── config.yml
│  ├── pbs
│  ├── samples.tsv
│  ├── slurm
│  └── units.tsv
├── css
│  └── styles.css
├── dags
│  ├── rulegraph.png
│  └── rulegraph.svg
├── data
│  ├── qiime2_composite.csv
│  ├── qiime2_tidy_metadata.csv
│  ├── qiime2_tidy_otutable.csv
│  └── qiime2_tidy_taxonomy.csv
├── images
│  ├── bkgd.png
│  ├── coders.png
│  ├── preprocess.png
│  ├── processdata.png
│  └── smkreport
├── index.Rmd
├── library
│  ├── apa.csl
│  ├── imap.bib
│  └── references.bib
├── report.html
├── resources
├── results
│  └── project_tree.txt
├── styles.css
└── workflow
├── Snakefile
├── envs
├── rules
├── schemas
├── scripts
└── workflow.Rproj
17 directories, 26 files
Troubleshooting of FAQs
- Question
- Question
-
Answer
-
Answer